Abstract

Parametric Binary Dissection (PBD) is a new algorithm that can
be used for partitioning graphs embedded in 2- or 3-dimensional
space. It partitions explicitly on the basis of nodes + λ × (edges cut),
where λ is the ratio of the time to communicate over an edge to the
time to compute at a node. The new algorithm is faster than the
original binary dissection algorithm and attempts to obtain better
partitions than the older algorithm, which only takes nodes into
account.

We compare the performance of parametric dissection with plain 
binary dissection on 3 large unstructured 3-d meshes obtained from 
computational fluid dynamics and on 2 random graphs. We show that 
the new algorithm can usually yield partitions that are substantially 
superior, but that its performance is heavily dependent on the input 
data. 


* Research supported by NASA Contract NAS1-19480 while the author was in residence
at the Institute for Computer Applications in Science and Engineering, NASA Langley
Research Center, Hampton, Virginia.
1 Introduction


In order to fully utilize parallel computers, it is crucial to uniformly partition 
the domain over which computations are to be performed. This problem is 
known to be computationally intractable and a number of heuristics have 
been developed for its solution. 

Binary dissection, or orthogonal recursive partitioning, developed in 1985
by Berger & Bokhari [2, 3], is a partitioning technique that is in widespread
use [1, 5, 6]. This is a fast and straightforward algorithm that carries out
partitioning as a series of recursive bisections, each of which minimizes the
maximum load (number of nodes) of the two halves. This algorithm does not take
communication costs into account and can sometimes yield partitions that have
a poor communication-to-computation ratio.
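To illustrate the recursive bisection idea, the following Python sketch assigns each node to one of 2^depth regions by repeatedly splitting a region at the median coordinate. It is a simplified sketch, not the Berger & Bokhari implementation. Cycling the cut direction through x, y, z by level is an assumption, and the name `binary_dissection` is hypothetical. Edges play no role at all, which is the limitation described above.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def binary_dissection(points: Sequence[Point], depth: int) -> List[int]:
    """Assign every point a region id in [0, 2**depth).

    Each level sorts a region along one axis and splits it at the median,
    so the two halves differ by at most one node. Edges are ignored.
    """
    region = [0] * len(points)

    def split(ids: List[int], level: int, label: int) -> None:
        if level == depth:
            for i in ids:
                region[i] = label
            return
        axis = level % 3  # assumption: cycle the cut direction x, y, z
        ids = sorted(ids, key=lambda i: points[i][axis])
        half = len(ids) // 2
        split(ids[:half], level + 1, 2 * label)      # lower half
        split(ids[half:], level + 1, 2 * label + 1)  # upper half

    split(list(range(len(points))), 0, 0)
    return region
```

For 16 points and depth 3, each of the 8 regions receives exactly 2 nodes, regardless of how the points are connected.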

The solution of aerodynamic problems on unstructured meshes is an im- 
portant area of research within the field of computational fluid dynamics. 
Unstructured meshes are graphs embedded in 2- or 3-dimensional space. 
The current requirement is to solve large (≈10^5 node, 10^6 edge) problems
on parallel computers such as the Intel iPSC/860 hypercube or the Paragon 2-D
mesh. Efficient utilization of these parallel machines requires good partition- 
ing of meshes over the processors of the system. 

When binary dissection is used for partitioning, the nodes of the problem
mesh are uniformly distributed over all processors but, of course, no attention
is paid to the number of edges that are cut. Each edge that is cut by the
partitioning results in an inter-processor communication requirement. We
normalize the time required to compute at a node to 1, and denote by λ the
time required to communicate over an edge. The normalized time required
by a specific partitioning of a problem mesh is then equal to the maximum
of nodes + λ × (edges cut) over all subregions.
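The cost model above can be evaluated directly for any given partition. The Python sketch below does so. Charging each cut edge to both regions it connects is an assumption about how "edges cut" is counted per subregion, and `normalized_time` is a hypothetical helper name.

```python
from typing import Iterable, List, Tuple


def normalized_time(region: List[int], edges: Iterable[Tuple[int, int]],
                    lam: float) -> float:
    """Return max over regions of nodes + lam * (edges cut).

    region[i] is the region of node i; computing at a node costs 1 and
    communicating over an edge costs lam.
    """
    nregions = max(region) + 1
    nodes = [0] * nregions
    cut = [0] * nregions
    for r in region:
        nodes[r] += 1
    for u, v in edges:
        if region[u] != region[v]:
            # Assumption: a cut edge costs communication in both regions.
            cut[region[u]] += 1
            cut[region[v]] += 1
    return max(n + lam * c for n, c in zip(nodes, cut))
```

For the path 0-1-2-3 split as [0, 0, 1, 1], each region has 2 nodes and 1 cut edge. The normalized time is therefore 2 + λ, which gives 4 for λ = 2.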

Parametric Binary Dissection (PBD) [4] is a new technique that attempts
to take communication overhead into account by partitioning on the basis of
load as well as communication cost. At each step of the dissection, an attempt
is made to minimize nodes + λ × (edges cut) for the two subregions. A
fast algorithm for PBD is given in [4]. Since PBD becomes ordinary binary
dissection when λ = 0, this fast algorithm also solves the original
problem more rapidly.
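To make the per-step objective concrete, here is a naive Python sketch of a single PBD cut along one axis. It is not the fast algorithm of [4]. The sketch sorts the nodes, uses a difference array to count the edges cut at every candidate position in one sweep, and picks the position that minimizes max(nodes in either half) + λ × (edges cut). Several simplifying assumptions apply: a single fixed axis is used, the larger half is taken as the load, and edges already cut at earlier levels are ignored. The name `pbd_split` is hypothetical. With λ = 0 the sketch returns the median cut of plain binary dissection.

```python
from typing import List, Sequence, Tuple


def pbd_split(coords: Sequence[float], edges: Sequence[Tuple[int, int]],
              lam: float) -> Tuple[int, float]:
    """Choose a cut for one PBD step along a single axis.

    coords[i] is node i's coordinate on that axis. Nodes are ordered by
    coordinate and the first k go to the left half. Returns (k, cost) with
    cost = max(k, n - k) + lam * cut(k), minimized over k = 1 .. n-1.
    """
    n = len(coords)
    order = sorted(range(n), key=lambda i: coords[i])
    rank: List[int] = [0] * n
    for pos, node in enumerate(order):
        rank[node] = pos
    # An edge whose endpoint ranks are a < b is cut exactly when a < k <= b.
    diff = [0] * (n + 1)
    for u, v in edges:
        a, b = sorted((rank[u], rank[v]))
        diff[a + 1] += 1
        diff[b + 1] -= 1
    best_k, best_cost, cut = 1, float("inf"), 0
    for k in range(1, n):
        cut += diff[k]  # running sum = number of edges cut at position k
        cost = max(k, n - k) + lam * cut
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost
```

In a six-node example where nodes 0 to 3 are densely connected, λ = 0 gives the balanced cut (3, 3.0). With λ = 2 the cut moves to k = 4, where only one edge is cut.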

In this paper we evaluate the performance of Parametric Binary Dissec- 
tion on 3 unstructured meshes taken from aerodynamic applications. The 
meshes that we use for our evaluation are as follows. 
Mesh            Nodes     Edges      Provided by
Wing & Pod      106064    697992     Dimitri Mavriplis
F-18            316399    2106889    Clyde Gumbert
Wing & Store    121200    818066     Neil Frink


We also provide results for the dissection of two large random 3-D graphs.

We evaluate PBD by applying it to each mesh for λ = 0, 2^-6, 2^-5, ..., 2^2
and for depths of partitioning varying from 1 to about 18. Since we are dealing
with binary dissection, each level of partitioning doubles the number of
regions. Thus a depth 4 partitioning results in 2^4 = 16 regions and would be
targeted to a 16-processor system. For each partition we obtain the maximum
number of edges cut and the maximum number of nodes over all regions.
The normalized run time of a partition is then (max nodes) + λ × (max edges
cut). For λ = 0, PBD degenerates into plain binary dissection. We can thus
compare the performance of plain and parametric dissection by dividing the
normalized run time at every depth for λ = 0 by the run time at various
non-zero values of λ. These ratios give the improvement of PBD over
plain dissection and are plotted in the following sections.
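The ratio described above can be computed as in the following Python sketch. It assumes that the λ = 0 (plain) partition is costed with the same λ as the PBD partition it is compared against. The function name and the per-depth (max nodes, max edges cut) layout are hypothetical.

```python
from typing import List, Tuple


def improvement(plain: List[Tuple[int, int]], pbd: List[Tuple[int, int]],
                lam: float) -> List[float]:
    """Ratio of plain to parametric normalized run time at each depth.

    plain[d] and pbd[d] hold (max nodes, max edges cut) at depth d + 1 for
    the lambda = 0 partition and for the partition computed with `lam`.
    Both are costed with the same lam; a ratio above 1 means PBD is faster.
    """
    ratios = []
    for (n0, e0), (n1, e1) in zip(plain, pbd):
        t_plain = n0 + lam * e0  # run time of the plain-dissection partition
        t_pbd = n1 + lam * e1    # run time of the PBD partition
        ratios.append(t_plain / t_pbd)
    return ratios
```

Consider made-up statistics of (100, 50) for plain dissection and (110, 20) for PBD at λ = 2. The ratio is 200/150 ≈ 1.33. At depth d the partition has 2**d regions.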

2 Wing and Pod 

This mesh, provided by Dimitri Mavriplis, is based on half a fuselage, wing 
and an engine. It has 106064 nodes and 697992 edges. When λ = 0, parametric
dissection is the same as ordinary dissection and there is no performance
advantage. For depths 7-11 there is degradation in performance, except for
large values of λ. For depths 3-6 and 11-15 there is performance
improvement, and this increases with λ. The time taken for PBD to depth 16 on this
mesh is 230 seconds on a Sparcstation-10. This time includes 67 seconds to
input the mesh.
3 F-18

The F-18 mesh was provided by Clyde Gumbert and has 316399 nodes and 
2106889 edges. The Parametric Dissection algorithm could not provide any 
performance improvement for this mesh. Careful analysis of the mesh re- 
vealed that the wings are exactly at the midpoint (in terms of nodes) of the 
domain. Thus the first two cuts tend to pass through the wings which, of 
course, contain no mesh elements. The following figure is a simplified
representation of a slice through the mesh (grey area) with the first cut passing
through the wing. The nose of the plane points towards the observer; only
half the plane is shown.



To verify this conclusion, we redid the experiment after applying a
rotation to the node coordinates. The rotation that we used was an arbitrarily
chosen 79° about each of the x, y and z axes, in that order.¹ The performance
on the rotated mesh is better, yielding very high performance improvements
for certain depths, but it is not as dramatic as for the previous case. The time
required for a depth 18 dissection of this mesh is 505 seconds on a 50-MHz
MIPS R4000 processor; this includes 82 seconds for input.

¹ Any set of rotations large enough to move the first 2 or 3 cuts out of the body of the
plane will suffice.
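The coordinate rotation used in this experiment can be sketched in Python as follows. The sketch assumes right-handed rotations about the fixed coordinate axes, applied first about x, then about y, then about z. The function name `rotate_xyz` is hypothetical. Only node coordinates change, and the graph's edges are untouched. Because dissection cuts are placed by coordinate, the rotation moves the cut planes relative to the aircraft body.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def rotate_xyz(points: Sequence[Point], degrees: float = 79.0) -> List[Point]:
    """Rotate points by `degrees` about the x axis, then y, then z.

    Right-handed rotations about the fixed coordinate axes are assumed.
    """
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    rotated = []
    for x, y, z in points:
        y, z = c * y - s * z, s * y + c * z   # about the x axis
        x, z = c * x + s * z, -s * x + c * z  # about the y axis
        x, y = c * x - s * y, s * x + c * y   # about the z axis
        rotated.append((x, y, z))
    return rotated
```

Rotation preserves all distances between nodes, so the mesh geometry is unchanged. Only the positions of the cutting planes relative to the mesh differ.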


[Figure: F-18 (n = 316399, e = 2106889): improvement vs. depth]
4 Wing and Store

This mesh was provided by Neil Frink and has 121200 nodes and 818068 
edges. The algorithm was able to show good performance improvements for 
several values of depth for this mesh. The time required is 301 seconds for 
depth 15 (88 seconds input time) on a Sparcstation-10. 

[Figure: Wing & Store (n = 121200, e = 818068)]
4.1 Wing and Store: modified algorithm

Careful examination of the plots presented so far will show that there is no 
performance improvement for depth 1. This is because the PBD implementation
we have been using ignores edges for the first cut. This choice was made
because, as explained in [4], taking edges into account for the first cut usually
yields very poor partitions. However, in some cases this is not true. For
the Wing and Store mesh, taking edges into account during the first cut can 
yield very large performance improvements for λ ≥ 1.
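The two variants differ only in whether λ is applied at the first level. The Python sketch below performs recursive dissection with a naive cut search and a hypothetical `edges_at_first_cut` flag. Several details are assumptions and not the implementation used in the experiments: the axis cycling, the counting of only the edges cut at the current step, and all of the names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Edge = Tuple[int, int]


def best_cut(ids: List[int], points: Sequence[Point], edges: List[Edge],
             axis: int, lam: float) -> Tuple[List[int], int]:
    """Sort `ids` along `axis`; return (order, k) where order[:k] goes left.

    k minimizes max(k, n - k) + lam * (edges cut by this step); `edges`
    must have both endpoints inside the region.
    """
    order = sorted(ids, key=lambda i: points[i][axis])
    n = len(order)
    rank = {node: pos for pos, node in enumerate(order)}
    diff = [0] * (n + 1)
    for u, v in edges:  # edge with ranks a < b is cut when a < k <= b
        a, b = sorted((rank[u], rank[v]))
        diff[a + 1] += 1
        diff[b + 1] -= 1
    best_k, best_cost, cut = 1, float("inf"), 0
    for k in range(1, n):
        cut += diff[k]
        cost = max(k, n - k) + lam * cut
        if cost < best_cost:
            best_k, best_cost = k, cost
    return order, best_k


def pbd(points: Sequence[Point], edges: List[Edge], depth: int, lam: float,
        edges_at_first_cut: bool = False) -> List[int]:
    """Recursive parametric dissection into 2**depth regions."""
    region = [0] * len(points)

    def split(ids: List[int], local: List[Edge], level: int, label: int) -> None:
        if level == depth:
            for i in ids:
                region[i] = label
            return
        # Level 0 ignores edges unless the modified variant is requested.
        lam_here = lam if (level > 0 or edges_at_first_cut) else 0.0
        order, k = best_cut(ids, points, local, level % 3, lam_here)
        left = set(order[:k])
        split(order[:k], [e for e in local if e[0] in left and e[1] in left],
              level + 1, 2 * label)
        split(order[k:], [e for e in local if e[0] not in left and e[1] not in left],
              level + 1, 2 * label + 1)

    split(list(range(len(points))), list(edges), 0, 0)
    return region
```

On a small graph where one end is densely connected, the default run splits the nodes 3/3 at depth 1. With `edges_at_first_cut=True` and λ = 2, the cut moves to a position that cuts fewer edges, giving a 4/2 split.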



5 Random Graphs 

Parametric Binary Dissection was also run on two randomly generated graphs. 
The first graph has 100000 nodes and 500287 edges. We obtain performance 
improvements at depths > 6 for all but the lowest values of λ.



[Figure: first random graph (n = 100000, e = 500287); curves for λ = 4, 2, 1, 2^-1, 2^-2, 2^-3, 2^-4, 2^-5, 2^-6 and 0]



The second random graph has 100000 nodes and about 2.5 million edges.
Except for λ = 2^-6, there is performance improvement for all values of
λ, though not as great as for the smaller random graph. For λ above 0.5,
the improvement saturates towards a smooth curve.


[Figure: Random Graph (n = 100000, e = 2500458): improvement vs. depth]
6 Conclusions

In this paper we have presented a brief and by no means exhaustive
evaluation of Parametric Binary Dissection (PBD) on unstructured meshes. Our
experimental results indicate that the performance of PBD is highly problem
dependent but that it can often provide very good improvements over plain
binary dissection. For meshes based on aircraft, the position of the wings can
be a troublesome problem, but this can be overcome to some extent by randomly
rotating the mesh points. Very good performance can sometimes be obtained
by using a slightly modified variant of the algorithm, as explained in Section 4.1.

The PBD algorithm is very fast and it is feasible for the practitioner to 
try out several partitions to choose the best one for his or her application. 
It should also be recognized that it may be better to run the problem on a 
smaller number of processors, if a very good partition has been obtained for 
a depth lower than the maximum depth possible. For example, in Section 4.1,
given a 32-processor system and λ = 2, it would be preferable to use
only 16 of these processors. This is because the depth 4 partition is 7 times
faster than the depth 5 partition, while doubling the number of processors
can at most halve the time.
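The processor-count argument can be expressed as choosing the depth with the smallest normalized run time, because that time already reflects the slowest of the 2^d processors. The Python sketch below assumes the per-depth normalized times have already been measured. The function name `best_depth` and the numbers in the usage note are hypothetical.

```python
from typing import Dict


def best_depth(times: Dict[int, float], max_depth: int) -> int:
    """Depth whose partition has the smallest normalized run time.

    times[d] is the normalized run time of the depth-d partition; ties go
    to the smaller depth, i.e. to fewer processors.
    """
    candidates = [d for d in times if d <= max_depth]
    return min(candidates, key=lambda d: (times[d], d))
```

Suppose the illustrative times are {4: 1000.0, 5: 7000.0} on a 32-processor (depth 5) machine. The function returns 4, so only 2**4 = 16 processors would be used.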